Skip to content

fs/buffer: serialize set_buffer_uptodate against concurrent clears#1305

Open
vfsci-bot[bot] wants to merge 1 commit intovfs.base.cifrom
pw/1087885/vfs.base.ci
Open

fs/buffer: serialize set_buffer_uptodate against concurrent clears#1305
vfsci-bot[bot] wants to merge 1 commit intovfs.base.cifrom
pw/1087885/vfs.base.ci

Conversation

@vfsci-bot
Copy link
Copy Markdown

@vfsci-bot vfsci-bot Bot commented Apr 30, 2026

Series: https://patchwork.kernel.org/project/linux-fsdevel/list/?series=1087885
Submitter: Chao Shi
Version: 2
Patches: 1/1
Message-ID: <20260430053645.1466196-1-coshi036@gmail.com>
Base: vfs.base.ci
Lore: https://lore.kernel.org/linux-fsdevel/20260430053645.1466196-1-coshi036@gmail.com


Automated by ml2pr

A WARN_ON_ONCE(!buffer_uptodate(bh)) in mark_buffer_dirty() is
reachable from the buffered write path on a block device when the
underlying device returns I/O errors at high density.  Reproduced
by fuzzing an NVMe controller (FEMU) that returns crafted error
completions for a sustained workload from /dev/nvme0n1.

The contract documented at set_buffer_uptodate() in
include/linux/buffer_head.h reads:

  Any other serialization (with IO errors or whatever that might
  clear the bit) has to come from other state (eg BH_Lock).

In fs/buffer.c, BH_Uptodate can be cleared from four I/O completion
callbacks: __end_buffer_read_notouch, end_buffer_write_sync,
end_buffer_async_read, end_buffer_async_write.

end_buffer_async_read() runs with BH_Lock held throughout, so its
clear is already serialized against any caller that also holds
BH_Lock around set_buffer_uptodate(); the call may in fact be
redundant, but addressing that is independent of this fix.

end_buffer_write_sync() likewise holds BH_Lock while it clears
BH_Uptodate on the write-error path.  Removing that clear would
change long-standing buffer-cache I/O-error semantics and is out
of scope here.

The race is therefore between block_commit_write() and
end_buffer_write_sync():

  CPU A: block_commit_write              CPU B: end_buffer_write_sync
         (folio lock held, BH_Lock NOT)         (BH_Lock held)
    set_buffer_uptodate(bh);
                                           clear_buffer_uptodate(bh);
                                           unlock_buffer(bh);
    mark_buffer_dirty(bh);  /* WARN */

CPU B observes the contract; CPU A does not.  With one side unlocked
the serialization is one-sided and ineffective: CPU A's set can be
immediately followed by CPU B's clear, tripping the WARN_ON_ONCE.
In the fuzzing reproducer, write-error completions are frequent
(visible as repeated "lost async page write" and per-LBA write
failures); buffer I/O completion callbacks on the write-error path
(e.g. end_buffer_write_sync, end_buffer_async_write) clear
BH_Uptodate while holding BH_Lock.

The bug is not benign: a not-uptodate buffer can be marked dirty
and subsequently written back; depending on whether the buffer was
fully or partially covered by the user write, this can leave on-disk
content that does not match the intended buffered write state.

Fix this by taking BH_Lock around set_buffer_uptodate() +
mark_buffer_dirty() in block_commit_write(), so both sides of the
contract use the documented serialization.

Found by FuzzNvme (Syzkaller with FEMU fuzzing framework).

Acked-by: Sungwoo Kim <iam@sung-woo.kim>
Acked-by: Dave Tian <daveti@purdue.edu>
Acked-by: Weidong Zhu <weizhu@fiu.edu>
Signed-off-by: Chao Shi <coshi036@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant